Rmsnorm #136
base: devel
📝 Walkthrough (summary by CodeRabbit)
Adds Pow and Sqrt support to the Generic target: new parsers, layer types, bindings, templates (scalar vs vector exponent dispatch), C kernel implementations for float32, a test vector, and CI entries for new tests.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (6)
Deeploy/DeeployTypes.py (1)
327-349: Docstring matches behavior; minor cleanup possible in visited set
The updated “live ancestors” wording matches the breadth‑first walk over the alias graph and better explains what’s being checked. One small implementation nit: `visited = set(self.name)` builds a set of characters rather than a set of buffer names; using `{self.name}` would make the intent clearer and avoid mixing types in `visited`, even though it doesn’t currently break correctness.
TargetLibraries/Generic/src/Sqrt_fp32.c (1)
1-13: Elementwise fp32 sqrt kernel looks correct
The `Sqrt_fp32_fp32` implementation is straightforward and type‑consistent with `float32_t`/`int32_t`, doing an elementwise `sqrtf` over the input range. Assuming `sqrtf` is declared via the transitive includes from `DeeployBasicMath.h`, there are no correctness issues here.
TargetLibraries/Generic/src/Pow_fp16.c (1)
1-26: Pow_fp16 implementation is correct for integer exponents; consider faster exponentiation
The kernel correctly handles zero and negative integer exponents and writes elementwise `base^exponent` into `data_out`. For typical small exponents this is fine, but the linear `for (j = 0; j < exp; j++)` loop makes runtime proportional to |exponent|. If you expect larger exponents or care about worst‑case latency, consider switching to exponentiation‑by‑squaring on a promoted `float` accumulator for better performance and numerical behavior, while preserving the `float16_t` I/O interface.
Deeploy/Targets/Generic/Layers.py (1)
230-240: PowLayer/SqrtLayer wiring is minimal and consistent with existing layers
The new `PowLayer` and `SqrtLayer` classes correctly follow the existing pattern of thin `ONNXLayer` wrappers around mappers. For current usage this is sufficient. If accurate op‑count reporting or explicit broadcasting for Pow becomes important, you may later want to override `computeOps` (e.g., proportional to tensor size) and, if needed, `computeShapes` similar to `AddLayer`/`MulLayer`.
Deeploy/Targets/Generic/Parsers.py (1)
1967-2001: Duplicate PowParser/SqrtParser definitions and mismatched exponent field
There are two separate definitions of `PowParser` and `SqrtParser` in this file: one here and another at lines 2813–2869. The later definitions override these ones at import time, so this block is effectively dead code and also:
- Triggers lints (`PowParser`/`SqrtParser` redefinition, undefined `ConstantBuffer` on Line 1990).
- Uses `exponent_value` instead of `exponent`, which doesn’t match `FloatPowTemplate.alignToContext`, where `exponent` is expected and `exponent_value` is derived there.
To avoid confusion and static-analysis noise, I’d consolidate to a single implementation (the newer one) and delete this earlier block entirely. A minimal fix would look like:
    -class PowParser(NodeParser):
    -    ...
    -
    -
    -class SqrtParser(NodeParser):
    -    ...
    -
leaving only the final `PowParser`/`SqrtParser` definitions at the bottom of the file.
Also applies to: 2003-2023
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)
14-28: FloatSqrtTemplate matches kernels; consider removing unused data_out
The template and `alignToContext` correctly:
- Infer `data_type` from `data_in` and
- Dispatch to `Sqrt_fp32_fp32`/`Sqrt_fp16_fp16` with the right arguments.
The only nit is that `data_out = ctxt.lookup(operatorRepresentation['data_out'])` is never used in `alignToContext`; you can safely drop that line to quiet Ruff and keep the function minimal.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- .github/workflows/ci-platform-generic.yml (1 hunks)
- Deeploy/DeeployTypes.py (3 hunks)
- Deeploy/Targets/Generic/Bindings.py (2 hunks)
- Deeploy/Targets/Generic/Layers.py (1 hunks)
- Deeploy/Targets/Generic/Parsers.py (2 hunks)
- Deeploy/Targets/Generic/Platform.py (3 hunks)
- Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1 hunks)
- Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1 hunks)
- TargetLibraries/Generic/inc/DeeployBasicMath.h (1 hunks)
- TargetLibraries/Generic/inc/kernel/Pow.h (1 hunks)
- TargetLibraries/Generic/inc/kernel/Sqrt.h (1 hunks)
- TargetLibraries/Generic/inc/types.h (1 hunks)
- TargetLibraries/Generic/src/Pow_fp16.c (1 hunks)
- TargetLibraries/Generic/src/Pow_fp32.c (1 hunks)
- TargetLibraries/Generic/src/Sqrt_fp16.c (1 hunks)
- TargetLibraries/Generic/src/Sqrt_fp32.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
TargetLibraries/Generic/inc/kernel/Sqrt.h (3)
- TargetLibraries/Generic/src/Sqrt_fp32.c (1): Sqrt_fp32_fp32 (9-13)
- DeeployTest/testUtils/dmaUtils.py (1): size (72-73)
- TargetLibraries/Generic/src/Sqrt_fp16.c (1): Sqrt_fp16_fp16 (9-13)
TargetLibraries/Generic/inc/kernel/Pow.h (2)
- TargetLibraries/Generic/src/Pow_fp32.c (1): Pow_fp32_int32_fp32 (9-27)
- TargetLibraries/Generic/src/Pow_fp16.c (1): Pow_fp16_int32_fp16 (8-26)
Deeploy/Targets/Generic/Layers.py (1)
- Deeploy/DeeployTypes.py (2): ONNXLayer (1819-2147), NodeMapper (1660-1816)
Deeploy/Targets/Generic/Parsers.py (1)
- Deeploy/Targets/Snitch/Parsers.py (3): parseNode (15-26), parseNodeCtxt (28-42), parseNodeCtxt (60-74)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (3)
- Deeploy/DeeployTypes.py (2): NetworkContext (508-1020), NodeTemplate (87-229)
- Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1): alignToContext (14-28)
- Deeploy/AbstractDataTypes.py (1): typeName (312-313)
Deeploy/Targets/Generic/Platform.py (3)
- Deeploy/Targets/Generic/Layers.py (2): PowLayer (230-233), SqrtLayer (236-239)
- Deeploy/Targets/Generic/Parsers.py (4): PowParser (1967-2000), PowParser (2814-2846), SqrtParser (2003-2023), SqrtParser (2849-2869)
- Deeploy/DeeployTypes.py (1): NodeMapper (1660-1816)
Deeploy/Targets/Generic/Bindings.py (2)
- Deeploy/CommonExtensions/DataTypes.py (2): float16_t (67-71), float32_t (74-78)
- Deeploy/DeeployTypes.py (2): CodeTransformation (2290-2324), NodeBinding (1512-1657)
🪛 Ruff (0.14.5)
Deeploy/Targets/Generic/Parsers.py
1978-1978: Unused method argument: channels_first
(ARG002)
1990-1990: Undefined name ConstantBuffer
(F821)
1995-1996: Prefer TypeError exception for invalid type
(TRY004)
1995-1996: Avoid specifying long messages outside the exception class
(TRY003)
2014-2014: Unused method argument: channels_first
(ARG002)
2814-2814: Redefinition of unused PowParser from line 1967
(F811)
2825-2825: Unused method argument: channels_first
(ARG002)
2849-2849: Redefinition of unused SqrtParser from line 2003
(F811)
2860-2860: Unused method argument: channels_first
(ARG002)
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py
19-19: Local variable data_out is assigned to but never used
Remove assignment to unused variable data_out
(F841)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py
19-19: Local variable data_out is assigned to but never used
Remove assignment to unused variable data_out
(F841)
30-30: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (9)
.github/workflows/ci-platform-generic.yml (1)
76-81: New generic-kernels tests are wired correctly in CI list
The added Pow/Sqrt/RMSNorm tests fit the existing naming pattern and placement in the float test block; no workflow logic changes. Please just confirm the test targets are defined under those exact names so CI doesn’t fail on a typo.
Deeploy/DeeployTypes.py (1)
2565-2589: _mapNode refactor cleanly decouples engine selection from layer construction
The new `_mapNode` that iterates engines and calls `engine.Mapping[node.op](node)` looks correct and keeps `NetworkContainer._bindLayers` simpler by returning either an `ONNXLayer` or side‑effect‑only mappings (e.g., Constant hoisting). The `isinstance(layer, ONNXLayer)` guard in `_bindLayers` ensures only real layers are scheduled, while Constant mappings still run via their lambda. This preserves multi‑engine support and fails fast with a clear `RuntimeError` when no mapping exists.
TargetLibraries/Generic/inc/DeeployBasicMath.h (1)
47-55: Pow/Sqrt kernel includes are correctly integrated
Including `kernel/Pow.h` and `kernel/Sqrt.h` alongside the other basic math kernels is consistent and sufficient to expose the new operations to users of `DeeployBasicMath.h`.
TargetLibraries/Generic/src/Sqrt_fp16.c (1)
1-13: fp16 sqrt kernel is consistent with the fp32 path
`Sqrt_fp16_fp16` mirrors the fp32 implementation, applying `sqrtf` elementwise and relying on the `float16_t` typedef for the actual storage type. This is a reasonable, simple implementation for FP16 support and aligns with the new type definition.
TargetLibraries/Generic/inc/types.h (1)
13-21: float16_t typedef is sensible and keeps non-FP16 platforms building
Defining `float16_t` as `_Float16` when compiler support is detected, and otherwise aliasing it to `float`, gives the new Pow/Sqrt FP16 kernels a consistent type while preserving buildability on targets without native FP16. The surrounding comments clearly document this fallback behavior.
TargetLibraries/Generic/inc/kernel/Sqrt.h (1)
1-24: Sqrt kernel header matches implementations
The include guard, DeeployBasicMath dependency, and fp32/fp16 prototypes are consistent with the corresponding C kernels; no issues from a correctness or integration perspective.
TargetLibraries/Generic/inc/kernel/Pow.h (1)
1-25: Pow kernel header is consistent with C implementations
Prototypes and guard are well-formed and match the Pow_fp32/Pow_fp16 C kernels; nothing blocking here.
Deeploy/Targets/Generic/Bindings.py (1)
10-11: Pow/Sqrt bindings are wired consistently with templates and types
The new `BasicPowBindings`/`BasicSqrtBindings` correctly:
- Use `float32_t`/`float16_t` pointer types for inputs/outputs.
- Bind to `FloatPowTemplate.referenceTemplate` and `FloatSqrtTemplate.referenceTemplate`.
- Reuse `DummyChecker` and `BasicTransformer` in line with nearby float ops.
Once the Pow parser/template exponent checks are tightened as discussed, these bindings look sound.
Also applies to: 18-22, 121-133
Deeploy/Targets/Generic/Platform.py (1)
10-17: Pow/Sqrt integration into Generic platform is coherent
The new imports, `PowMapper`/`SqrtMapper` definitions, and `'Pow'`/`'Sqrt'` entries in `GenericMapping` line up correctly with:
- `BasicPowBindings`/`BasicSqrtBindings`,
- `PowLayer`/`SqrtLayer`, and
- The Pow/Sqrt kernels exposed via `DeeployBasicMath.h`.
Assuming `DeeployBasicMath.h` now includes the new `kernel/Pow.h` and `kernel/Sqrt.h`, the end‑to‑end wiring looks correct.
Please double‑check that `DeeployBasicMath.h` actually includes the new Pow/Sqrt kernel headers so generated code has the necessary prototypes.
Also applies to: 20-22, 27-29, 56-57, 104-105
The current implementation does not support true FP16 arithmetic. Instead, data is cast to FP32 internally. These tests are being removed to avoid misleading results until native half-precision support is implemented.
diaconuccalin
left a comment
Good job overall, but there are some changes that we need to address.
Most of them concern the following:
- Remove integer exponent enforcement for Pow
- Remove constant exponent enforcement for Pow
- Remove all traces of FP16 version for Generic, since the compiler for this platform doesn't support this format (as we talked privately, we will use it directly in Snitch, since here it would only help us create the proper infrastructure, like binding and parser, but we've already done it with FP32)
This commit addresses code review feedback:
- Refactor Pow kernel to use 'powf' from math.h to support floating-point exponents.
- Update PowParser to allow tensor exponents instead of forcing constants.
- Remove Generic FP16 support and revert types.h changes.
- Remove duplicate PowParser/SqrtParser classes.
- Enhance RMSNorm tests with larger shapes and non-trivial weights.
Actionable comments posted: 0
🧹 Nitpick comments (3)
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)
12-13: Remove unused variable lookup.
The `data_out` variable is looked up but never used in `alignToContext`. Since the template only needs `data_in` to infer the data type and compute size, you can safely remove this line.
Apply this diff:
      # Get input and output tensors
      data_in = ctxt.lookup(operatorRepresentation['data_in'])
    - data_out = ctxt.lookup(operatorRepresentation['data_out'])
      # Get data type (fp32)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1)
12-14: Remove unused variable lookup.
The `data_out` variable is looked up but never used in `alignToContext`. The method only needs `data_in` and `exponent` to infer the data type and compute sizes.
Apply this diff:
      # Get input and output tensors
      data_in = ctxt.lookup(operatorRepresentation['data_in'])
      exponent = ctxt.lookup(operatorRepresentation['exponent'])
    - data_out = ctxt.lookup(operatorRepresentation['data_out'])
      # Get data type (fp32)
Deeploy/Targets/Generic/Bindings.py (1)
121-129: Consider more specific type checkers for Pow and Sqrt.
The bindings use `DummyChecker` which provides minimal type validation. While this may be intentional for flexibility, you might want to define dedicated `PowChecker` and `SqrtChecker` classes (similar to `AddChecker`, `MulChecker`, etc.) to provide more specific type validation for these operations.
This can be deferred if the current approach aligns with the project's type-checking strategy. The bindings are otherwise correctly structured.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- .github/workflows/ci-platform-generic.yml (1 hunks)
- Deeploy/Targets/Generic/Bindings.py (2 hunks)
- Deeploy/Targets/Generic/Parsers.py (3 hunks)
- Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1 hunks)
- Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1 hunks)
- TargetLibraries/Generic/inc/kernel/Pow.h (1 hunks)
- TargetLibraries/Generic/inc/kernel/Sqrt.h (1 hunks)
- TargetLibraries/Generic/src/Pow_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/ci-platform-generic.yml
🧰 Additional context used
🧬 Code graph analysis (5)
TargetLibraries/Generic/inc/kernel/Sqrt.h (2)
- TargetLibraries/Generic/src/Sqrt_fp32.c (1): Sqrt_fp32_fp32 (9-13)
- DeeployTest/testUtils/dmaUtils.py (1): size (72-73)
Deeploy/Targets/Generic/Bindings.py (3)
- Deeploy/CommonExtensions/DataTypes.py (1): float32_t (74-78)
- Deeploy/DeeployTypes.py (2): CodeTransformation (2290-2324), NodeBinding (1512-1657)
- Deeploy/AbstractDataTypes.py (1): PointerClass (536-559)
TargetLibraries/Generic/inc/kernel/Pow.h (2)
- TargetLibraries/Generic/src/Pow_fp32.c (2): Pow_fp32_fp32_fp32 (10-17), Pow_fp32_scalar_fp32 (19-26)
- DeeployTest/testUtils/dmaUtils.py (1): size (72-73)
Deeploy/Targets/Generic/Parsers.py (1)
- Deeploy/DeeployTypes.py (7): NetworkContext (508-1020), NodeParser (1023-1198), VariableBuffer (232-360), ConstantBuffer (393-430), parseNode (1033-1048), inputs (2503-2520), parseNodeCtxt (1051-1076)
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)
- Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1): alignToContext (9-34)
🪛 Ruff (0.14.6)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py
14-14: Local variable data_out is assigned to but never used
Remove assignment to unused variable data_out
(F841)
Deeploy/Targets/Generic/Parsers.py
1978-1978: Unused method argument: channels_first
(ARG002)
1995-1996: Prefer TypeError exception for invalid type
(TRY004)
1995-1996: Avoid specifying long messages outside the exception class
(TRY003)
2799-2799: Unused method argument: channels_first
(ARG002)
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py
13-13: Local variable data_out is assigned to but never used
Remove assignment to unused variable data_out
(F841)
🔇 Additional comments (6)
TargetLibraries/Generic/inc/kernel/Sqrt.h (1)
20-20: LGTM!
The function signature is correct for an element-wise square root operation. The naming convention follows the pattern seen in other kernels and the parameters are appropriate.
TargetLibraries/Generic/inc/kernel/Pow.h (1)
16-24: LGTM!
Both function signatures correctly use `float32_t` for the exponent parameter(s), which allows the kernels to support general floating-point exponents via `powf`. The const and restrict qualifiers are appropriate.
TargetLibraries/Generic/src/Pow_fp32.c (1)
10-26: LGTM!
Both kernel implementations correctly use `powf` which supports general floating-point exponents. The array-based and scalar-based variants are implemented appropriately for broadcasting scenarios.
Deeploy/Targets/Generic/Parsers.py (2)
2788-2808: LGTM!
The `SqrtParser` implementation is straightforward and correct for a unary square root operation. It properly extracts the input/output tensors and computes the size.
Note: The `channels_first` parameter is unused (flagged by static analysis), but this is likely required by the `NodeParser` interface.
1990-1996: Incorrect exponent handling: casting to int loses precision and enforcing constants limits functionality.
There are two critical issues here:
- Integer casting loses precision: Line 1991 casts the exponent to `int`, but the C kernel `Pow_fp32_fp32_fp32` and `Pow_fp32_scalar_fp32` use `powf` which supports floating-point exponents. For example, an exponent of `2.5` would be silently truncated to `2`, producing incorrect results.
- Constant enforcement is too restrictive: Lines 1994-1996 reject non-constant (variable tensor) exponents, but this unnecessarily limits the operator's functionality. Per past review feedback and the ONNX Pow specification, variable exponents should be supported.
Apply this diff to support float exponents and remove constant enforcement:
      # Extract exponent value from the constant tensor
      if isinstance(exponent_tensor, ConstantBuffer):
    -     exp_value = int(exponent_tensor.values.flatten()[0])
    -     self.operatorRepresentation['exponent_value'] = exp_value
    - else:
    -     # Tensor exponent not supported
    -     raise ValueError(f"Node {node.name}: Exponent must be a constant. "
    -                      f"Variable tensor exponents are not supported.")
    +     exp_value = float(exponent_tensor.values.flatten()[0])
    +     self.operatorRepresentation['exponent_value'] = exp_value
    + # Variable tensor exponents are now supported via the array-based kernel
Based on learnings from past reviews requesting float exponent support and removal of constant enforcement.
Likely an incorrect or invalid review comment.
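Whether or not the comment above still applies to the current diff, the truncation it describes is easy to reproduce. This is a small sketch in plain Python, with `math.pow` standing in for the C `powf`; the base and exponent values are illustrative assumptions, not values from the test suite.

```python
import math

base = 4.0
exponent = 2.5  # hypothetical fractional exponent, as an ONNX Pow node may carry

truncated = int(exponent)          # int() drops the fractional part
exact = math.pow(base, exponent)   # uses the float exponent as given
lossy = math.pow(base, truncated)  # uses the truncated integer exponent

print(truncated, exact, lossy)
```

The two results differ by a factor of `base ** 0.5`, which is why a float-valued exponent has to reach the kernel unchanged.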
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1)
25-34: LGTM!
The scalar broadcasting logic is well-implemented. The template correctly distinguishes between scalar and array exponents, selecting the appropriate kernel (`Pow_fp32_scalar_fp32` vs `Pow_fp32_fp32_fp32`) and constructing the proper variable reference for scalar exponents.
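The semantics of the two kernel variants named above can be modelled in a few lines of Python. This is a hedged reference sketch: the C bodies are not shown in this thread, so the functions `pow_vector` and `pow_scalar` are hypothetical stand-ins that follow the review's description (elementwise `powf` with either a per-element or a single broadcast exponent).

```python
import math
from typing import List


def pow_vector(data_in: List[float], exponent: List[float]) -> List[float]:
    """Model of the vector-exponent variant: data_in[i] ** exponent[i]."""
    if len(data_in) != len(exponent):
        raise ValueError("vector exponent must have the same size as the input")
    return [math.pow(b, e) for b, e in zip(data_in, exponent)]


def pow_scalar(data_in: List[float], exponent: float) -> List[float]:
    """Model of the scalar-exponent variant: one exponent broadcast to all elements."""
    return [math.pow(b, exponent) for b in data_in]


# Note: math.pow raises ValueError for a negative base with a fractional
# exponent, whereas C powf returns NaN; the model only covers valid inputs.
print(pow_vector([1.0, 4.0, 9.0], [2.0, 0.5, -1.0]))
print(pow_scalar([1.0, 2.0, 3.0], 2.0))
```

A model like this can serve as a reference when generating expected outputs for a scalar test and a vector test.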
diaconuccalin
left a comment
Great progress and a good idea with the scalar/non-scalar exponent differentiation! I've left a few more improvement suggestions.
Besides the comments below, don't forget to run `make format` so that all tests pass, and to update the CHANGELOG.md file (it is in the root folder). Add the information you already have in your PR description to the top of each corresponding list in the changelog.
#include "DeeployBasicMath.h"
#include <math.h>

void Pow_fp32_fp32_fp32(const float32_t *__restrict__ data_in,
Currently we only have a test for the scalar case. Please add one for vector exponents while keeping the scalar one (rename the existing one if that makes more sense, and add the new one to the CI pipeline as well).
Actionable comments posted: 0
🧹 Nitpick comments (2)
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)
13-29: Approve dynamic typing; optional cleanup for unused variable.
The template correctly uses dynamic `type_width` extraction (`data_in._type.referencedType.typeWidth`) and applies it in the kernel call (`Sqrt_fp${type_width}_fp${type_width}`), following best practices from learnings to avoid hardcoded types.
Optionally, you can remove the unused `data_out` lookup at line 17 to clean up the code, though its presence is harmless:
    - data_out = ctxt.lookup(operatorRepresentation['data_out'])
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1)
13-49: LGTM! Size validation and dynamic typing correctly implemented.
The template properly:
- Uses dynamic `type_width` extraction following best practices.
- Implements scalar vs. vector dispatch logic based on exponent shape.
- Validates `input_size == exponent_size` for non-scalar exponents (line 42-44), directly addressing previous reviewer feedback.
Optionally, you can remove the unused `data_out` lookup at line 18:
    - data_out = ctxt.lookup(operatorRepresentation['data_out'])
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- .github/workflows/ci-platform-generic.yml (1 hunks)
- .gitignore (1 hunks)
- Deeploy/Targets/Generic/Bindings.py (2 hunks)
- Deeploy/Targets/Generic/Parsers.py (3 hunks)
- Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1 hunks)
- Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1 hunks)
- DeeployTest/Tests/testFloatPowVector/network.onnx (1 hunks)
- TargetLibraries/Generic/inc/kernel/Pow.h (1 hunks)
- TargetLibraries/Generic/src/Pow_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/ci-platform-generic.yml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-02T13:54:22.700Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 69
File: Deeploy/Targets/PULPOpen/Templates/FloatLayernormTemplate.py:36-38
Timestamp: 2025-12-02T13:54:22.700Z
Learning: In Deeploy templates (Python files in Deeploy/Targets/PULPOpen/Templates/), always use explicit bitwidth types (e.g., `float${...type.referencedType.typeWidth}_t*`) instead of hardcoded types (e.g., `float*`) to ensure type consistency with templated kernel calls.
Applied to files:
- Deeploy/Targets/Generic/Templates/FloatPowTemplate.py
- TargetLibraries/Generic/src/Pow_fp32.c
- Deeploy/Targets/Generic/Bindings.py
- Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py
🧬 Code graph analysis (3)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (2)
- Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1): alignToContext (13-29)
- Deeploy/AbstractDataTypes.py (2): typeName (312-313), typeWidth (399-400)
TargetLibraries/Generic/inc/kernel/Pow.h (1)
- TargetLibraries/Generic/src/Pow_fp32.c (2): Pow_fp32_fp32_fp32 (10-16), Pow_fp32_scalar_fp32 (18-24)
Deeploy/Targets/Generic/Parsers.py (1)
- Deeploy/DeeployTypes.py (4): ConstantBuffer (393-430), NetworkContext (508-1020), VariableBuffer (232-360), lookup (720-752)
🪛 Ruff (0.14.7)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py
18-18: Local variable data_out is assigned to but never used
Remove assignment to unused variable data_out
(F841)
43-44: Avoid specifying long messages outside the exception class
(TRY003)
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py
17-17: Local variable data_out is assigned to but never used
Remove assignment to unused variable data_out
(F841)
Deeploy/Targets/Generic/Parsers.py
1978-1978: Unused method argument: channels_first
(ARG002)
2789-2789: Unused method argument: channels_first
(ARG002)
🔇 Additional comments (7)
.gitignore (1)
27-27: LGTM!
Adding `.venv/*` to the ignore list is a sensible best practice for Python projects and aligns with the introduction of Python-based build tooling and tests in this PR.
Deeploy/Targets/Generic/Parsers.py (2)
1967-1991: LGTM! Exponent handling is now flexible.
The PowParser correctly treats the exponent as a general tensor input without enforcing it to be a constant, aligning with the ONNX Pow specification and previous reviewer feedback. The scalar vs. vector dispatch is appropriately delegated to the template layer.
2778-2798: LGTM! SqrtParser implementation is correct.
The parser correctly extracts input, output, and size information for the Sqrt operation. The unused `channels_first` parameter is acceptable as it conforms to the `NodeParser.parseNodeCtxt` interface signature, which is used consistently across all parsers in this file.
TargetLibraries/Generic/inc/kernel/Pow.h (1)
16-22: LGTM! Dual kernel variants provide flexibility.
The header correctly declares both vector (`Pow_fp32_fp32_fp32`) and scalar (`Pow_fp32_scalar_fp32`) exponent variants, enabling efficient dispatch based on exponent shape. The signatures use proper const-correctness and restrict qualifiers.
TargetLibraries/Generic/src/Pow_fp32.c (1)
10-24: LGTM! Implementations correctly use powf for full float exponent support.
Both kernel variants properly leverage `powf` from `math.h`, which supports arbitrary floating-point exponents including fractional and negative values. This ensures full ONNX Pow semantics compliance and addresses previous concerns about integer-only exponents.
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1)
52-59: LGTM! Kernel dispatch correctly implemented.
The reference template properly dispatches between scalar and vector Pow kernels based on the `is_scalar` flag, with consistent use of dynamic type widths (`${type_width}`) throughout both branches.
Deeploy/Targets/Generic/Bindings.py (1)
121-129: LGTM! Bindings correctly wire Pow and Sqrt operations.
The new `BasicPowBindings` and `BasicSqrtBindings` properly integrate the Pow and Sqrt templates into the platform. The use of `DummyChecker` with appropriate float32 pointer types is consistent with similar operations in this file, and both bindings correctly reference their respective templates and apply the `BasicTransformer`.
diaconuccalin
left a comment
Great job, good effort! :) I've looked through the most recent changes and everything seems fine, we will try to merge it as soon as possible.
This PR adds support for RMSNorm (Root Mean Square Normalization) operation to the Deeploy framework's Generic platform. RMSNorm is a critical normalization technique used in modern Transformer architectures and large language models. To enable RMSNorm deployment on embedded systems, this PR implements the necessary mathematical primitives (Pow and Sqrt operations) and integrates them into Deeploy's compilation pipeline.
The implementation follows Deeploy's operator decomposition approach, where RMSNorm is constructed from basic mathematical operations rather than as a monolithic kernel. This design provides flexibility and maintainability. On the Generic target the kernels are float32 only; the float16 variants proposed initially were removed during review.
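To make the decomposition concrete, here is a minimal reference sketch of RMSNorm built from elementwise primitives, using the standard definition y = x / sqrt(mean(x²) + eps) · w. The exact operator sequence in the PR's ONNX test graph is not shown in this thread, so the step labels in the comments, the function name `rmsnorm`, and the default `eps` are assumptions for illustration.

```python
import math
from typing import List


def rmsnorm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """Reference RMSNorm over one vector, composed from basic operations."""
    squared = [math.pow(v, 2.0) for v in x]          # Pow (scalar exponent 2)
    mean_sq = sum(squared) / len(squared)            # mean reduction
    rms = math.sqrt(mean_sq + eps)                   # Add eps, then Sqrt
    return [v / rms * w for v, w in zip(x, weight)]  # Div, then Mul by weight


out = rmsnorm([3.0, 4.0], [1.0, 1.0], eps=0.0)
print(out)
# With unit weights and eps = 0, the mean of the squared outputs is 1.
print(sum(v * v for v in out) / len(out))
```

Pow and Sqrt are the two steps that the Generic target was missing, which is why this PR adds them.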
Added
- Pow (Power) operation support
  - FloatPowTemplate.py: Mako template for C code generation
  - Pow_fp32.c: Kernel implementations for both precisions
  - kernel/Pow.h: Kernel interface definitions
- Sqrt (Square Root) operation support
  - FloatSqrtTemplate.py: Mako template for C code generation
  - Sqrt_fp32.c: Kernel implementations
  - kernel/Sqrt.h: Kernel interface definitions
- Comprehensive test suites
  - testFloatPow: Pow operator tests with ONNX models and reference data
  - testFloatSqrt: Sqrt operator tests
  - testFloatRMSNorm: End-to-end RMSNorm tests demonstrating operator composition
Changed
- Framework integration files
  - Deeploy/Targets/Generic/Parsers.py: Added PowParser and SqrtParser for ONNX graph parsing
  - Deeploy/Targets/Generic/Layers.py: Added corresponding Layer classes for both operations
  - Deeploy/Targets/Generic/Bindings.py: Added type checking and binding registration
  - Deeploy/Targets/Generic/Platform.py: Registered new operations in platform mapping
- Runtime library headers
  - TargetLibraries/Generic/inc/DeeployBasicMath.h: Extended with Pow and Sqrt function declarations
  - TargetLibraries/Generic/inc/types.h: Updated type definitions for consistency
- CI/CD configuration
  - .github/workflows/ci-platform-generic.yml: Updated to include new test cases in automated testing pipeline
Fixed
PR Merge Checklist
devel commit and pointing to devel.
CHANGELOG.md file has been updated.